New partitioner: KD-Tree partitioner #127

abualia4 · 2019-08-08T12:32:47Z

General

New partitioning technique based on balanced KD-tree was developed.

coverKD method is added to BoxEnvelope.scala file, (spark3D/src/main/scala/com/spark3d/geometryObjects/).
Also I made minor modifications as follows:

Partitioner.scala file, (/home/alia/spark3D/src/main/scala/com/spark3d/Partitioners.scala).
Repartitioning.scala, (/home/alia/spark3D/src/main/scala/com/spark3d/Partitioners.scala).
GridTypes.scala, (/home/alia/spark3D/src/main/scala/com/spark3d/utils/GridType.scala).
run_viz_scala.sh, to visualize the KD-tree partitioner, (spark3D/).

New added code

The main files for the new Partitioner (KD-tree) are as follows:

KDtree.scala, (spark3D/src/main/scala/com/spark3d/spatialPartitioning/).
KDtreePartitioning.scala, (spark3D/src/main/scala/com/spark3d/spatialPartitioning/).
KDtreePartitioning.scala, (spark3D/src/main/scala/com/spark3d/spatialPartitioning/).
Test.scala, (spark3D/src/main/scala/com/spark3d/examples/)
arun_scala.sh, (spark3D/).

Evaluation of the new code

In order to evaluate the performance of the new spatial partitioning technique which is based on balanced KD-tree, this methodology took empirical approach by comparing it with onion and octree partitioners which were implemented in the spark3D framework, after applying them on two astronomical data sets. These data sets have been generated by CoLoRe fast simulation. The first dataset has more than 37 million 3D points, and data has been randomly drawn along spherical coordinates. The second dataset contains more than seven and a half millions of 3D points, and data has been randomly drawn along Cartesian axes. All implementations were run on the LAL Apache Spark cluster hosted in the virtual data cloud.

Partitioner		Spherical Dataset	Cartesian Dataset
--	Number of objects/points	37,079,120	7,768,425
KD-tree	Standard Deviation (#of objects)	2,536	6,652
KD-tree	Computational Time (Minute)	2.76	1.27
Octree	Standard Deviation (#of objects)	276,712	39,317
Octree	Computational Time (Minute)	3.7	1.45
Onion	Standard Deviation (#of objects)	54,195	38,006
Onion	Computational Time (Minute)	15.68	2.65

…rtitions are received

…ons is added

to continue my work

JulienPeloton · 2020-03-19T12:29:37Z

Hi @abualia4, thanks for the PR. Sorry it took me a long time to review it, but I will add my comments now. Please read them, and implement change if any.
I hope you and your family are safe.
Julien

JulienPeloton

Thanks @abualia4. Could you please have a look at the comments and implement the suggestions for change in order to merge the PR.

JulienPeloton · 2020-03-19T12:31:09Z

run_part_scala.sh

-fitsfn=
+MASTERURL=spark://134.158.75.222:7077
+fitsfn=hdfs://134.158.75.222:8020//lsst/LSST1Y/out_srcs_s1_0.fits
+


The general rule is to not commit URL in a repo. Please keep the previous empty variables:

MASTERURL= fitsfn=

JulienPeloton · 2020-03-19T12:32:47Z

run_viz_scala.sh

    target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
-    $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "username" "api_key"
+    $inputfn "position_x_mock,position_y_mock,position_z_mock" false "kdtree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw"


Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key?

JulienPeloton · 2020-03-19T12:33:11Z

run_viz_scala.sh

+  #  --jars ${SP},${HP} --packages ${PACK} \
+  # --class com.astrolabsoftware.spark3d.examples.VisualizePart \
+  #  target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
+   #  $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw"


Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key?

JulienPeloton · 2020-03-19T12:35:03Z

src/main/scala/com/spark3d/examples/test.scala

@@ -0,0 +1,141 @@
+/*


Could you give a meaningful name for this file? test.scala is too vague, and something like TestKDTree.scala would be better.

JulienPeloton · 2020-03-19T12:36:03Z

src/main/scala/com/spark3d/geometryObjects/BoundaryKD.scala

+    var minX: Double, var maxX: Double,
+    var minY: Double, var maxY: Double,
+    var minZ: Double, var maxZ: Double)
+  extends  Serializable {


what is the purpose of this class? it is not used elsewhere, so I would recommend to remove this file from the repo.

JulienPeloton · 2020-03-19T12:39:54Z

src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala

@@ -84,7 +81,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat
    * @return list of Tuple of containing partitions and their index/partition ID's
    */
  override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = {
-
+    println("getPartitionNodes")


Remove unnecessary println

JulienPeloton · 2020-03-19T12:40:13Z

src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioning.scala

@@ -22,7 +22,7 @@ import scala.collection.mutable.ListBuffer

 class OctreePartitioning (private val octree: Octree)
  extends Serializable {
-
+  println("OctreePartitioning")


Remove unnecessary println

JulienPeloton · 2020-03-19T12:40:19Z

src/main/scala/com/spark3d/spatialPartitioning/OnionPartitioner.scala

@@ -40,7 +40,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Point3D
  *
  */
 class OnionPartitioner(grids : List[ShellEnvelope]) extends SpatialPartitioner(grids) {
-
+  println("OnionPartitioner")


Remove unnecessary println

JulienPeloton · 2020-03-19T12:40:37Z

src/main/scala/com/spark3d/spatialPartitioning/OnionPartitioning.scala

@@ -31,7 +31,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Shape3D._
  *
  */
 class OnionPartitioning extends Serializable {
-
+println ("OnionPartitionong")


Remove unnecessary println

JulienPeloton · 2020-03-19T12:41:08Z

src/main/scala/com/spark3d/spatialPartitioning/learn.scala

@@ -0,0 +1,81 @@
+/*


Please, remove this unnecessary file from the repo.

JulienPeloton · 2020-03-19T12:50:46Z

src/main/scala/com/spark3d/examples/test.scala

+    else 
+     println("Please determine the number of partitions as 1, 2, 4, 8 and so on")
+  }
+  else  For others


I do not think else For others is a correct Scala statement... You probably miss a # before For

By the way, this is why the continuous integration is failing.

mayurdb · 2020-03-19T12:50:50Z

I'll also take a tomorrow. Seems awesome at the first glance 👍

…

On Thu, Mar 19, 2020, 6:15 PM Julien ***@***.***> wrote: ***@***.**** requested changes on this pull request. Thanks @abualia4 <https://github.com/abualia4>. Could you please have a look at the comments and implement the suggestions for change in order to merge the PR. ------------------------------ In run_part_scala.sh <#127 (comment)> : > @@ -10,8 +10,9 @@ PACK=com.github.astrolabsoftware:spark-fits_2.11:0.7.1 SF=target/scala-2.11/spark3d_2.11-${VERSION}.jar HP=lib/jhealpix.jar -MASTERURL= -fitsfn= +MASTERURL=spark://134.158.75.222:7077 +fitsfn=hdfs://134.158.75.222:8020//lsst/LSST1Y/out_srcs_s1_0.fits + The general rule is to not commit URL in a repo. Please keep the previous empty variables: MASTERURL= fitsfn= ------------------------------ In run_viz_scala.sh <#127 (comment)> : > target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \ - $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "username" "api_key" + $inputfn "position_x_mock,position_y_mock,position_z_mock" false "kdtree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw" Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key? ------------------------------ In run_viz_scala.sh <#127 (comment)> : > -spark-submit \ - --master ${MASTERURL} \ - --driver-memory 4g --executor-memory 28g --executor-cores 17 --total-executor-cores 102 \ - --jars ${SF},${HP} --packages ${PACK} \ - --class com.astrolabsoftware.spark3d.examples.VisualizePart \ - target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \ - $inputfn "Z_COSMO,RA,DEC" true "onion" 8 0.0001 "username" "api_key" + #inputfn=hdfs://134.158.75.222:8020/user/julien.peloton/dc2 + #spark-submit \ + # --master ${MASTERURL} \ + # --driver-memory 4g --executor-memory 28g --executor-cores 17 --total-executor-cores 102 \ + # --jars ${SP},${HP} --packages ${PACK} \ + # --class com.astrolabsoftware.spark3d.examples.VisualizePart \ + # target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \ + # $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw" Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key? ------------------------------ In src/main/scala/com/spark3d/examples/test.scala <#127 (comment)> : > @@ -0,0 +1,141 @@ +/* Could you give a meaningful name for this file? test.scala is too vague, and something like TestKDTree.scala would be better. ------------------------------ In src/main/scala/com/spark3d/geometryObjects/BoundaryKD.scala <#127 (comment)> : > + * + * Default constructor is kept private to avoid creating an instance of the cube Envelope class without initialising + * the min/max coordinates along axes incorrectly. + * + * @param minX minimum coordinate of cube Envelope along X-axis + * @param maxX maximum coordinate of cube Envelope along X-axis + * @param minY minimum coordinate of cube Envelope along Y-axis + * @param maxY maximum coordinate of cube Envelope along Y-axis + * @param minZ minimum coordinate of cube Envelope along Z-axis + * @param maxZ maximum coordinate of cube Envelope along Z-axis + */ +class BoundaryKD private( + var minX: Double, var maxX: Double, + var minY: Double, var maxY: Double, + var minZ: Double, var maxZ: Double) + extends Serializable { what is the purpose of this class? it is not used elsewhere, so I would recommend to remove this file from the repo. ------------------------------ In src/main/scala/com/spark3d/geometryObjects/BoxEnvelope.scala <#127 (comment)> : > @@ -11,7 +11,7 @@ * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and - * limitations under the License. + * limitations under the License.env Please, remove the .env added mistakenly. ------------------------------ In src/main/scala/com/spark3d/geometryObjects/BoxEnvelope.scala <#127 (comment)> : > @@ -624,23 +627,49 @@ class BoxEnvelope private( * being checked for containment * @param z the z-coordinate of the point for which this cube Envelope is * being checked for containment - * @return true if (x, y, z) lies in the interior or - * on the boundary of this cube Envelope, false if the cube Envelope is null. + * @return true if (x, y, z) lies in the interior of this cube Envelope, false if the cube Envelope is null. Typo - you removed a part of the documentation ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala <#127 (comment)> : > + + + /** + * Get the number of partitions in this partitioning + * + * @return the number of partitions + */ + override def numPartitions: Int = { + grids.size + + } + + /** + * + * + */ Documentation missing ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala <#127 (comment)> : > + + partitionId=0 + for(element<-grids){ + if(element.coversKD(c0,c1,c2)){ + return partitionId + } + partitionId+=1 + } + + 0 + + } + + /** + * + */ Documentation missing ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala <#127 (comment)> : > + } + + 0 + + } + + /** + * + */ + override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = { + null + } + + /** + * + */ Documentation missing ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala <#127 (comment)> : > + * + */ + override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = { + null + } + + /** + * + */ + override def getNeighborNodes[T <: Shape3D](spatialObject: T, inclusive: Boolean = false): List[Tuple2[Int, Shape3D]] = { + null + } + + /** + * + */ Documentation missing ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala <#127 (comment)> : > @@ -44,8 +43,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat * */ override def placeObject[T <: Shape3D](spatialObject: T): Iterator[Tuple2[Int, T]] = { - - val result = HashSet.empty[Tuple2[Int, T]] + val result = HashSet.empty[Tuple2[Int, T]] Incorrect indentation ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala <#127 (comment)> : > @@ -66,8 +64,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat * */ override def placePoints(c0: Double, c1: Double, c2: Double, isSpherical: Boolean) : Int = { - - val spatialObject = new Point3D(c0, c1, c2, isSpherical) + val spatialObject = new Point3D(c0, c1, c2, isSpherical) Incorrect indentation ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala <#127 (comment)> : > @@ -84,7 +81,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat * @return list of Tuple of containing partitions and their index/partition ID's */ override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = { - + println("getPartitionNodes") Remove unnecessary println ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioning.scala <#127 (comment)> : > @@ -22,7 +22,7 @@ import scala.collection.mutable.ListBuffer class OctreePartitioning (private val octree: Octree) extends Serializable { - + println("OctreePartitioning") Remove unnecessary println ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/OnionPartitioner.scala <#127 (comment)> : > @@ -40,7 +40,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Point3D * */ class OnionPartitioner(grids : List[ShellEnvelope]) extends SpatialPartitioner(grids) { - + println("OnionPartitioner") Remove unnecessary println ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/OnionPartitioning.scala <#127 (comment)> : > @@ -31,7 +31,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Shape3D._ * */ class OnionPartitioning extends Serializable { - +println ("OnionPartitionong") Remove unnecessary println ------------------------------ In src/main/scala/com/spark3d/spatialPartitioning/learn.scala <#127 (comment)> : > @@ -0,0 +1,81 @@ +/* Please, remove this unnecessary file from the repo. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#127 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADBQ6XWPSNHLNUP3DJJVWADRIIHXJANCNFSM4IKJ3HUQ> .

JulienPeloton · 2020-03-19T12:52:36Z

Thanks @mayurdb ! I hope you are in a safe place.

alia added 30 commits July 22, 2019 17:17

First Part

f7c72a8

Second Update

c97e69a

Updated

9f486d1

Added

1d282dc

Added

a6f1b74

KDtree option is added

4382b67

KDtree option is added

82e085a

KDtree is added

a7a385d

KDtree option has been added

ff2086a

Updated

1bf8b37

Updated

bf3a9ee

Updated

195169e

Boundary box problem is fixed

1be165f

The number of partitions is determined, the the boundary boxes for pa…

973bb67

…rtitions are received

placePoints method is ready

be83043

The level of KDtree which uses to find the boundary boxes for partiti…

3c21725

…ons is added

it uses to check the KDtree

f27582b

Validation is addet

6d76411

updated

3de67ec

Validation is added

b2c83b1

Updated

e65baa3

KDtree is added

ac60b6c

it used for develping KD-tree

f119781

containKD methid is added

e3f3a8f

updated

472264d

KDtree is added

e02828e

Bounding box is fixed

5ed8977

Updated

1827df4

nothing

ce2745e

placePoint method for KD-tree is added

0d9004e

alia and others added 19 commits August 5, 2019 12:30

Bounding box is fixed

e7ded79

Merge remote-tracking branch 'upstream/master'

2606485

Nothing

1d54c58

Nothin

0048ce4

Tracing

0f23805

Merge branch 'master' into origin

300d28a

to continue my work

Simple commit test

00a8b3a

Simple commit test

7e6dcae

KDtree partitioner is added

97420ec

Documenting the code

1775a8e

Documenting the code

2189f56

Documenting the code

d9c8b57

Documenting the code

71f15e5

Documenting the code

e4b541e

Documenting the code

1eb1665

Documenting the code

3feb9f7

Documenting the code

9065432

This bash code is used by Ahmed Alia to evaluate his work

57b6705

To perform some visual inspection of the KDtree partitioner

3927c4c

JulienPeloton self-requested a review August 8, 2019 12:33

JulienPeloton requested changes Mar 19, 2020

View reviewed changes

JulienPeloton reviewed Mar 19, 2020

View reviewed changes

JulienPeloton added Partitioning scala labels Mar 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New partitioner: KD-Tree partitioner #127

New partitioner: KD-Tree partitioner #127

abualia4 commented Aug 8, 2019

JulienPeloton commented Mar 19, 2020

JulienPeloton left a comment

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

JulienPeloton Mar 19, 2020

mayurdb commented Mar 19, 2020 via email

JulienPeloton commented Mar 19, 2020

New partitioner: KD-Tree partitioner #127

Are you sure you want to change the base?

New partitioner: KD-Tree partitioner #127

Conversation

abualia4 commented Aug 8, 2019

General

New added code

Evaluation of the new code

JulienPeloton commented Mar 19, 2020

JulienPeloton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mayurdb commented Mar 19, 2020 via email

JulienPeloton commented Mar 19, 2020